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We examine the sources of error in the histogram reweighting method for Monte Carlo data analysis. 
We demonstrate that, in addition to the standard statistical error which has been studied elsewhere, 
there are two other sources of error, one arising through correlations in the reweighted samples, and 
one arising from the finite range of energies sampled by a simulation of finite length. We demonstrate 
that while the former correction is usually negligible by comparison with statistical fluctuations, the 
latter may not be, and give criteria for judging the range of validity of histogram extrapolations 
based on the size of this latter correction. 



I. INTRODUCTION 

Monte Carlo simulations have a long and interesting 
history. As a tool for studying physical systems (rather 
than for performing integrals), they date back at least as 
far as the pioneering work on neutron diffusion by En- 
rico Fermi in the 1930s |Q, but Monte Carlo methods 
really came to prominence in the fifties following the cal- 
culations on hard-sphere gases and other simple systems 
performed by Ulam, Metropolis, von Neumann and oth- 
ers using the early digital computers at Aberdeen and 
Los Alamos §]. In the last three decades, with the avail- 
ability of ever-increasing amounts of computer power, the 
Monte Carlo method has become one of the most impor- 
tant tools in the statistical physicist's tool-box. 

Although the name Monte Carlo covers a multitude 
of different ideas and techniques, we concentrate in this 
paper on the simulation of classical models in thermal 
equilibrium. All equilibrium Monte Carlo calculations re- 
volve around the same fundamental idea. One generates 
a number of states i = 1 ... n of the system of interest and 
measures for each one the total energy Ei and any other 
quantities of interest Xi, Yi, etc. Normally all states i are 
not generated equally probably, but with varying prob- 
abilities pi, a technique known as importance sampling. 
The best estimate of the thermal average (X) of a quan- 
tity X is then given by 



(X) = 



J2i Xip-^-P^ 



(i) 



where (3 = (fcT)" 1 is the inverse temperature and k is 
the Boltzmann constant. In some cases, particularly in 
systems which display symmetry- or ergodicity-breaking, 
we may not in fact wish to calculate an average over all 
states in this way j|. For the purposes of this paper 
however, we assume that we are working with ergodic 
systems for which expectations of the form (^) are phys- 
ically meaningful. 

The most common choice by far for the probabilities 



Pi is to make them proportional to the Boltzmann weight 
of the corresponding state at the temperature of interest 

(2) 

in which case Equation ([!]) reduces to a simple average 
over the measurements Xi . Many other choices have been 
investigated however, including simple or uniform sam- 
pling ||] in which pi is a constant independent of i, en- 
tropic sampling Q in which pi is proportional to the re- 
ciprocal of the density of states at energy Ei, and 1/k 
sampling Q in which pi is proportional to the reciprocal 
of the integrated density of states. In the present paper 
we investigate the case in which the states are sampled 
with probabilities proportion to their Boltzmann weights, 
but at a temperature To different from the temperature 
at which we wish to calculate (X). In other words, we 
imagine performing a normal thermal Monte Carlo sim- 
ulation at a temperature To, and then ask for the best 
estimate of the expectation of A at a different tempera- 
ture T. Making the replacement (3 — ► /3q in Equation (^) 
and substituting into (|l|), we obtain 



(X) 



(3) 



This is not a new result. Already in 1972, Valleau and 
Card H] pointed out that it is possible in theory to ex- 
tract a value for (X) at any temperature from the results 
of a single thermal Monte Carlo simulation using an equa- 
tion of this type. Their results were rediscovered and 
extended in 1988 by Ferrenberg and Swendsen ||, who 
dubbed this technique the "single histogram method". 
The name is something of a misnomer, since the method's 
application does not necessarily involve the construction 
of any histograms. Ferrenberg and Swendsen's formula- 
tion however was in terms of histograms and, as we will 
see, it is often convenient to represent the method in this 
way. 

Defining the double histogram H(E, X) to be the num- 
ber of states i sampled for which Ei — E and Xi = X, 
we can rewrite Equation ([}]) in the form 
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(X) = 



J2 E ^ x XH(E,X)e-^-M E 
J2 EiX H(E ) X)e-(0-^)E ■ 



If we define a set of weights 

W(E, X) = H(E, X) e -i0-M E 



(4) 



(5) 



then Equation (||) can be rewritten as a weighted average 
over X: 



(X) = 



J2 E ^ X XW(E,X) 

J2e, x w(e,x) 



(6) 



Note that W(E, X) and H(E,X) become equal when 
f3 — [3q. In effect, W(E, X) is an estimate of the value of 
the histogram H(E,X) at the temperature of interest. 

It is possible to write an equation similar to (||) for pa- 
rameters other than the temperature, allowing us to ex- 
trapolate the results of a single simulation to other values 
of any external field appearing in the Hamiltonian. It is 
also straightforward to generalize the histogram method 
to non-Boltzmann sampling schemes. Here however we 
concentrate on the simple case described above. 

In this paper we explore the sources of error in his- 
togram extrapolations. The statistical errors inherent 
in the method have been discussed at some length else- 
where and it is not our intention to reproduce pre- 
vious results here. We focus instead on two important 
sources of error which have been neglected in previous 
studies. In Section [n] we discuss errors introduced as a 
result of the finite range of energies sampled in a simu- 
lation of finite length, and show that in certain temper- 
ature regimes this, and not statistical fluctuation, is the 
dominant source of error. In Section III we discuss er- 
rors introduced by the correlation between fluctuations 
in the numerator and denominator of Equation (|3|). In 
Section IV we discuss corrections to the normal expres- 
sion for the statistical errors arising from the previous 
analysis and show that to leading order these corrections 
are negligible. In Section [V] we give our conclusions. 



II. FINITE SAMPLE SIZE ERRORS 

Suppose that we perform a single Monte Carlo simu- 
lation at temperature To on some system of interest, and 
that this simulation samples n states of the system at in- 
tervals of t s Monte Carlo steps. We assume in this paper 
that t s is much greater than the correlation time r of 
the simulation algorithm used (also measured in Monte 
Carlo steps) so that the states may be considered to be 
statistically independent. More generally, if t s and r are 
comparable, then the variance in a measured quantity is 
increased by a factor of 1 + 2t/t s over its value for uncor- 
rected samples |10| |. All the results given in this paper 
can be generalized to this case in a straightforward man- 
ner; see Rcf. [pi for a thorough exploration of this issue. 
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FIG. 1. The weight function W{E) for a 32 x 32 Ising fer- 
romagnet on a square lattice in two dimensions, calculated 
at four different temperatures from a single simulation at the 
critical temperature T c = 2.269 of the infinite system. The 
curves shown are (left to right) for T = T c , 2.3, 2.4, and 2.6. 

In the limit of an infinite number of independent sam- 
ples, n — > oo, Equation (|^) is exact and correctly gives the 
value of (X) at all temperatures. In practice, however, n 
is always finite, and this limits the range over which the 
extrapolation is valid. In Figure ^ we show an example 
of the use of the single histogram method to calculate 
the internal energy of a two-dimensional Ising model in 
zero field. The case of the internal energy is particularly 
simple, since the weight function W(E, X) reduces in this 
case to a function W(E) of a single variable E, the energy 
of the states sampled in the simulation. The figure shows 
the calculated value of this function for a variety of differ- 
ent temperatures at distances increasingly far from the 
temperature To of the original simulation. For small de- 
viations from To the calculated value of W(E) is a good 
approximation to the histogram H(E) which would be 
generated by a simulation performed at temperature T. 
However as T strays farther from To, the value of W(E) 
becomes an increasingly poor representation of the cor- 
rect histogram, as can be seen in the figure. The source of 
this problem is clear: a finite- n Monte Carlo simulation 
samples energies in only a rather narrow range around 
the value U(Tq) of the equilibrium internal energy of the 
system at To. Extrapolation of the results to tempera- 
tures T for which the true histogram H{E) would possess 
significant contributions at energies outside this range is 
therefore guaranteed to give poor results. In the par- 
ticular case of the internal energy, it is clear that if the 
highest energy sampled by our simulation is E+ , then no 
reweighting of our histogram can ever produce an esti- 
mate of U(T) = (E) greater than E+, regardless of the 
true value. 

The usual rule of thumb for estimating the range of 
validity of the extrapolation is to require that the mean 
of the reweighted distribution W(E), which is just the 
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internal energy U(T), should be less than <te away from 
the mean U(Tq) of the histogram H(E), where <je is the 
standard deviation of H(E). Since ge is related to the 
specific heat C at T according to C(T ) = kf3 2 a E , we 
can also express this condition in terms of C(Tq) as 



[U(T) - U(T )} 2 < kT 2 C(T ) 



(7) 



Equation (]?]) can be simplified further if we make the 
derivative approximation 



U(T)-U(To)^(T-T )^ 



= ATC(T ) 



(8) 



where AT = T — To is the temperature range over which 
we are extrapolating. Employing this approximation, our 
condition becomes 



AT 



< 



c(T y 



(9) 



This condition is intuitively easy to understand and in 
most cases is a reasonable guide for applying the his- 
togram method. However, as we will demonstrate, the 
actual range of validity of the method can deviate arbi- 
trarily far from the value of AT given by Equation (^), 
depending on the number n of samples generated by the 
Monte Carlo simulation. 

We now construct a more accurate criterion for the ex- 
trapolation range. The basic idea is to make an estimate 
of the energy E + above which there are no samples, and 
then to approximate the error introduced into our ex- 
trapolation by assuming that the histogram is accurate 
up to E+, and contains no samples thereafter. We do 
the same for the lower limit EL of the histogram. A 
variation on this idea would be to restrict the extrapo- 
lation to a range of energies such that some prescribed 
fraction of the samples in the histogram fall within that 
range. However, since the tails of the histogram typically 
decay exponentially or faster, these two approaches give 
approximately the same results. 

Consider the ideal histogram H(E), which we define to 
be the value of the histogram H(E) averaged, bin by bin, 
over an infinite number of simulations which generate 
n samples each. We then approximate the histogram 
resulting from a single simulation by 



H(E) 



_ J (n/n')H(E) 




if E- < E < E+ 
otherwise. 



(10) 



The factor n/n', where ri = J^T H(E) dE, is a normal- 
izing factor which ensures that the integral of H(E) over 
E is correctly equal to n. The values of E + and Tv_ are 
defined naturally by 



H{E±) = a, 
where a is a constant of order unity. 



(11) 



Making this approximation, the extrapolated internal 
energy U (T) can be written as 



AU = U{T) - U(T) 

J E e <fl-fh)E Jj^e) dE JE e^-M E H(E) dE 



f e (P-M E H(E) dE 



d J^c-(' 3 -M E H(E)dE 
dp l0S J™ e -(P-ME W{E) dE 



J e (P-M E H(E) dE 
(12) 



In order to proceed we make a Gaussian approximation 
for H(E): 



H(E) 



exp 



[E-U(T )f 
2*% 



(13) 



This assumption is an excellent guide for the behavior 
of most systems at temperatures well above T = 0. For 
instance, in the Ising system of Figure [l it gives log H(E) 
within a few percent over more than a hundred orders of 
magnitude of H(E). 

Using Equation ( |l3| ) and another derivative approxi- 
mation: 



-03 -A 



dU 

d[3 



A) 



[/(To) - U(T), (14) 



we complete the square to obtain 

71 

/(/?) exp 



where 



/(/?) = exp 



[E-U(T)f 



2*1 



U 2 (T)-U 2 (T ) 



2a% 



(15) 



(16) 



is a shorthand for all the terms in the exponential which 
depend on f3 but not on E. Substituting Equation (|lF 
into (O) and performing the integral leads to 



AU = | log 



E- U(T) \ 
V2a E J 



-,E 4 



2cr| exp(— x + ) — exp(— x_) 



(17) 



where erf (a:) = ^ J** 
tion, and 



erf(x+) — erf(x_) 
e - * 2 dt is the Gaussian error func- 



x± 



E± - U(T) 



y/2, 



(JE 



= ±Jlog 



<J E AT 



2iraa E V2kTf 



(18) 



using Equations (§), © and@. 

Between them, Equations (|l7) and ( ]l8| ) give us an es- 
timate of the deviation of the extrapolation of U from its 
true value as a function of the number of samples n and 
the temperature range AT over which we extrapolate. 
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FIG. 2. The difference AU between the true internal en- 
ergy of a 100 x 100 Ising ferromagnet and an extrapolation 
using Equation (^) of the same quantity from simulations 
with n = 100 samples performed at a single temperature 
T = 2.269. The line is a fit using Equations (0) and @. 
Energies are in units of the coupling constant J, and may be 
compared to U(T ) = -1.4 x 10 4 . 

As a test of this calculation we have plotted in Figure |2| 
the value of AU measured in simulations of a 100 x 100 
Ising model on a square lattice in two dimensions. The 
data points with error bars show the difference between 
the true internal energy (obtained from further indepen- 
dent simulations) and those calculated via Equation (|^) 
from simulations with n = 100 samples at temperature 
To = 2.269 (the critical temperature of the infinite sys- 
tem). These points are averaged over 1000 repetitions 
of the simulation at T c . The solid line is from Equa- 
tions ( |l7| ) and (|lj) with the constant a chosen so as to 
best fit the data. As the figure shows, the agreement 
between the two is good. 

In a typical Monte Carlo calculation we want to know 
the range of temperature AT over which we can extrap- 
olate from a single histogram to a given degree of ac- 
curacy AU as a function of the sample size n. In the 
regime where U (T) approaches either of the limits E_ 
or E+, one or other of the terms on the top and bottom 
of Equation (Q) becomes a constant (either zero or one) 
and the variation in AU resides entirely in the remain- 
ing terms. In this case a line of constant AU is also a 
line of constant x+ or x_ (for AT positive or negative 
respectively) which means that 



±Jlog 



ct £ AT 



2ira E a V^kT^ 



(19) 



with the value of the constant b depending on the size of 
error AU we are willing to live with. Thus, for given AU, 
the temperature range AT over which the extrapolation 
is valid increases at most logarithmically with increasing 
sample size n. 
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FIG. 3. Inset: the difference between the true and extrap- 
olated internal energies of a 100 x 100 Ising ferromagnet for 
a variety of different sample sizes n. Main figure: the range 
AT over which the extrapolation is accurate to ±100, as a 
function of n. The points are the values from the simulations 
shown in the inset and the two solid lines are Equation (p"9|), 
taking the + and — signs separately. The upper curve and 
points are for positive AT, the lower ones for negative AT. 

In Figure || we demonstrate this formula for the 100 x 
100 two-dimensional Ising model. The inset shows ex- 
trapolations from the critical temperature of the infinite 
system for sample sizes n — 10, 20, 50, 100, 200, 500, 
and 1000, using Equation (|J). The errors in these results 
are comparable to the widths of the lines. The dashed 
lines show an arbitrarily-chosen deviation of AU — ±100 
from the true value as our limit of acceptable accuracy — 
a relative error of about 0.7%. The intersections of the 
solid curves and dashed lines give the ranges AT over 
which simulations with different n give acceptable results. 
The main figure shows these ranges as points with error 
bars, the upper points corresponding to values AT > 
(i.e., extrapolation above To), the lower ones to AT < 0. 
The solid lines are Equation ( |l9| ) with the constants a 
and b chosen by a least squares fit to the data. As the 
figure shows, simulation and theory are in good agree- 
ment. 

As an example of the use of Equation ([l9|), consider 
the results of Miinger and Novotny [ pd| who performed 
an extensive numerical study of the accuracy of the sin- 
gle histogram extrapolation method for the case of the 
q = 3 Potts ferromagnet in two dimensions. They con- 
cluded that the values of the specific heat predicted by 
the method show systematic deviations from the true val- 
ues, and presented evidence indicating that the size of 
these deviations decrease with increasing n. In fact, a 
simple application of Equations (|9|) and ([l9]) reveals im- 
mediately what the problem is. For the parameter values 
and sample sizes used in their calculations, the range over 
which they attempt to extrapolate satisfies the simple cri- 
terion (||), but falls outside the bounds of accuracy set 
by ©. 
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Miinger and Novotny deliberately performed simula- 
tions with small values of n in order to investigate the 
inaccuracies of the histogram method. However, in nor- 
mal use, the method is applied to simulations with large 
n, and in the region close to To where the deviation AU 
is small. We can characterize this regime as one in which 
|^±| 3> 1, in which case the value of the denominator in 
Equation (|l7| ) is close to 2 and the primary variation in 
AU comes from the Gaussians in the numerator: 



AU 



exp( 



-)]• 



(20) 



Since E+ and E— are symmetrically distributed about 
C/(To), we have x+(Tq) — — i_(Tq), and the two terms 
cancel to give AU — at T — T , as expected. The 
leading term in the expansion of AU about this point is 
linear in AT with coefficient 



dAU 



dT 



o u E 



/log 



1-KG 



2 „2- 



(21) 



Thus AU tends to zero roughly as 1/n to leading order, 
and the higher order terms vanish faster than this. As 
we will see in Section [V , the statistical errors in extrap- 
olated quantities fall off in the normal 1 / yfn fashion, so 
that in the region close to To, finite sample size errors 
always become negligible for sufficiently large n. 

On the other hand, when we get far away from T , the 
extrapolated value of U becomes roughly equal to E + or 
T_ (depending on the direction in which we extrapolate) 
and hence approximately independent of n, since E± only 
varies slowly with n. Thus the error AU is approximately 
n-independent in this regime and dominates over statis- 
tical errors for sufficiently large n. The point of crossover 
between the two regimes is given by Equation Jl9|). 

A similar argument can be made for the extrapola- 
tion of quantities other than the energy. The limiting 
extrapolated values of any quantity Y are set by the val- 
ues Y± corresponding to the highest and lowest energies 
sampled in the simulation, and since these energies are 
approximately n-independent, so normally will Y± be. 
Thus Equation ([l9]) tells us for any quantity Y the point 
of crossover at which errors due to the finite number of 
samples in the histogram become the dominant source of 
inaccuracy in the histogram method. 



III. DISTRIBUTION ERRORS 

There is another source of systematic error in the es- 
timates given by the single histogram method which has 
not, to our knowledge, been remarked upon before. Even 
ignoring the corrections discussed in the last section, 
which were due to the imperfect sampling of the his- 
togram H(E), Equation (|j) is not in fact a correct ex- 
pression for the best estimate of (X) for any finite n. 



To understand this, consider again the hypothetical sit- 
uation in which we perform a large number N of simu- 
lations of the system of interest, each one generating n 
statistically independent samples drawn from the Boltz- 
mann distribution at To. For each one we calculate an 
estimate 



(X)i = 



-((3-f3 )E t] 



Pi 



-(P-Po)E i: 



(22) 



where i = 1 ... N labels the different simulations and 
X^ is the value of X in the jth state sampled by the ith 
simulation. The new quantities P and Q will provide a 
convenient shorthand for the numerator and denominator 
of this equation. 

Now we want to compute the best estimate of (X) over 
all N simulations. Since the samples in each simulation 
were drawn from the same distribution, we can just as 
well regard them all as being one large set of samples of 
size nN drawn from a single simulation, in which case it 
is clear that in the limit of large N the correct answer for 
(X) is 



(X) = 



-(P-0 o )Eij p 



J2 -(P- MEii Q 



(23) 



where P and Q indicate the averages of P{ and Qi over 
all N simulations. (We use the barred notation to avoid 
confusion with the notation (X) for thermal expectation 
values.) This equation indicates that the best estimate of 
(X) is calculated by separately averaging the numerator 
and denominator of Equation ( p2| ) over our many simu- 
lations. In practice, one does not perform many simula- 
tions; one performs only one simulation with finite n and 
then calculates the ratio P/Q for that one simulation. 
The mean value of this ratio however is not the same as 
the ratio of the means, 
correct answer. 



Equation (23), which gives the 
This difference leads to a systematic er- 
ror in the predictions of the single histogram method for 
finite sample sizes. In this section we calculate the size 
of this error. 

Consider the double Taylor expansion of the quantity 
P/Q around P/Q: 



P 



P — 1 

= + (P-P = 
Q 1 Q 



(Q-Q)=2 



(Q-Q) 2 ^-(P-P)(Q-Q)^2 
Q Q 



(24) 



Taking the average of both sides over many repetitions 
of the simulation, the linear terms vanish and to leading 
order we are left with 



P 



i 



a% cov(P,Q) 



Q 2 



PQ 



(25) 



where <Tq is the variance of Q over simulations i and 
cov(P, Q) is the covariance of P and Q. Thus the mean 
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value of the quantity P/Q, which is the quantity mea- 
sured in our Monte Carlo calculations, differs from the 
true value of (X) — P/Q by the factor enclosed in the 
square brackets [...]. One should take this factor into 
account in order to correctly calculate the extrapolation 
of a quantity. 

Given that in a typical situation we only perform one 
simulation of our system, what is the best estimate we 
can make of this factor from our Monte Carlo results? 
Clearly the best estimates of P and Q are simply the 
values of P and Q measured in the simulation: P = P, 
Q = Q. The best estimates of the variance and covari- 
ance terms are 



1 



-2(/3-P )E j 



(26) 



and 



cov(P,Q) = -l T {^X j 



-2(/3-/3 )_E 3 



j2 e -(P-M E] y (27) 



Substituting these into Equation (B5|) we see that the 
correction term scales as 1/n with sample size. But, as 
shown below, statistical errors scale as 1/y/n and there- 
fore dominate for large n. Thus it should be safe to ignore 
errors of the type described by Equation ( p5| ) for simula- 
tions of sufficient length. 



4 = 



2 e -2( l 8-/3 )E j 



[E 



X -(13-ME, 



(29) 



it is clear that a\ scales as 1/n, and hence that ax scales 
as 1/ Y^n, as claimed earlier. This is a slower scaling than 
the 1/n of the previous section, but still much better 
than the approximately constant value of the finite sam- 
ple size error of Section [FJ for large extrapolation range 
AT. This means that we must use an equation such 
as ( p^ ) to decide which of these two latter sources of er- 
ror is the dominant one under given circumstances. 



V. CONCLUSIONS 

In this paper we have examined in detail the sources 
of error in the Monte Carlo extrapolation method known 
as the single histogram method. We have discussed three 
sources of error: finite sample size errors, systematic er- 
rors due to the approximations made in the calculation of 
the extrapolation, and finally statistical errors. The first 
two of these have not, to our knowledge been discussed 
previously, and in particular we find that the finite sam- 
ple size errors are, under commonly encountered condi- 
tions, significantly larger than either of the other sources 
of error. 
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IV. STATISTICAL ERRORS 

The third and final source of error which we consider is 
statistical fluctuation in the extrapolation due to the es- 
sential random nature of a Monte Carlo simulat ion. We 
can calculate the variance cr-^j of the quantity P/ Q by 



a technique similar to that used to derive Equation (25) 



we perform a Taylor expansion of P 2 /Q 2 about P/Q and 
take the average over many simulations. Then we calcu- 

2 

late the variance as cr^v^ = P 2 /Q 2 — P/Q . The vari- 



ance a x of the best estimate of (X) is then a^— times 



the square of the correction factor in Equation (12J 
leading order this gives 



' x 



(xy< 



_ °p , *q _ 9 cov(P,Q) 

~~ ==2 + — 2 Z " 



P~ 



Q 



PQ 



). To 



(28) 



This expression is identical to that given by Ferren- 
berg et al. , for the error on the uncorrected estimate 
P/Q. 

Using Equations ( p6| ) and (^), along with the obvious 
extension 
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